-
-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gix-protocol::fetch()
without Delegates
#1634
Conversation
d1717c3
to
c7d477d
Compare
b5071ed
to
437be4f
Compare
This is a first step towards creating a multi-step `fetch()` which can also be used in the `gix` crate.
842b0de
to
42bc48e
Compare
…n error. This makes the system more reliable, particularly when certain operations are performed on Drop, and can happen prematurely.
203e91e
to
6e506ca
Compare
That way it's easier to reuse shallow-handling code from plumbing crates. Note that this is a breaking change as `gix-protocol` now uses the `gix-shallow::Update` type, which doesn't implement a formerly public `from_line()` method anymore. Now it is available as `fetch::response::shallow_update_from_line()`.
29950a7
to
e9e1553
Compare
For now, just move the code down and immediately re-integrate in `gix` to be able to use its tests to validate it. This is a breaking change as some types move and change the layout.
00d866f
to
5f11cc7
Compare
@@ -107,7 +107,9 @@ title "gix (with repository)" | |||
) | |||
fi | |||
|
|||
(with "git:// protocol" | |||
# for some reason, on CI the daemon always shuts down before we can connect, | |||
# or isn't actually ready despite having accepted the first connection already. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EliahKagan Just to share, this is an oddity that might just be coincidence, or it is indeed very subtly related to the code change here.
What happens is that on CI it can never connect to the Git daemon that it just started. The script only proceeds once the first connection was established using nc
. For wall I can tell, the server is down and doesn't accept connections at all, when a connection is attempted. The code doing that didn't even change. Only the shutdown of the connection (with EOF packet being sent) happens on Drop instead of explicitly. But for all I can tell, it doesn't even get there. And the first connection attempt already fails, so it's not that a previous connection somehow messed up the daemon.
It's very strange.
Locally, it would sometimes reproduce, but also sometimes it could work for 10 times in a row.
It seems the Git daemon doesn't work on CI at all anymore.
With debugging, the daemon prints this on success:
it-daemon
19:45:49.427373 exec-cmd.c:238 trace: resolved executable dir: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core
[86841] Connection from [::1]:64256
fatal: the remote end hung up unexpectedly
[86839] [86841] Disconnected (with error)
19:45:49.448501 run-command.c:655 trace: run_command: REMOTE_ADDR='[::1]' REMOTE_PORT=64257 /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git-daemon --serve --verbose --base-path=. --export-all --user-path
19:45:49.451073 exec-cmd.c:139 trace: resolved executable path from Darwin stack: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git-daemon
19:45:49.451788 exec-cmd.c:238 trace: resolved executable dir: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core
[86853] Connection from [::1]:64257
19:45:49.451957 pkt-line.c:80 packet: git< git-upload-pack /\0host=localhost\0
[86853] Extended attribute "host": localhost
[86853] Request upload-pack for '/'
19:45:49.452171 run-command.c:655 trace: run_command: git upload-pack --strict --timeout=0 .
[86853] 19:45:49.454781 exec-cmd.c:139 trace: resolved executable path from Darwin stack: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git
[86853] 19:45:49.455688 exec-cmd.c:238 trace: resolved executable dir: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core
[86853] 19:45:49.455955 git.c:460 trace: built-in: git upload-pack --strict --timeout=0 .
[86853] 19:45:49.456947 pkt-line.c:80 packet: upload-pack> 3f72b39ad1600e6dac63430c15e0d875e9d3f9d6 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/main object-format=sha1 agent=git/2.39.5.(Apple.Git-154)
[86853] 19:45:49.458105 pkt-line.c:80 packet: upload-pack> ee3c97678e89db4eab7420b04aef51758359f152 refs/heads/dev
[86853] 19:45:49.458150 pkt-line.c:80 packet: upload-pack> 3f72b39ad1600e6dac63430c15e0d875e9d3f9d6 refs/heads/main
[86853] 19:45:49.458153 pkt-line.c:80 packet: upload-pack> feae03400632392a7f38e5b2775f98a439f5eaf5 refs/tags/annotated
[86853] 19:45:49.458245 pkt-line.c:80 packet: upload-pack> ee3c97678e89db4eab7420b04aef51758359f152 refs/tags/annotated^{}
[86853] 19:45:49.458248 pkt-line.c:80 packet: upload-pack> efa596d621559707b2d221f10490959b2decbc6c refs/tags/unannotated
[86853] 19:45:49.458342 pkt-line.c:80 packet: upload-pack> 0000
[86853] 19:45:49.459051 pkt-line.c:80 packet: upload-pack< 0000
[86853] 19:45:49.459055 trace.c:411 performance: 0.003111000 s: git command: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git upload-pack --strict --timeout=0 .
[86839] [86853] Disconnected
[with] version 2
[it] generates the correct output19:45:49.483351 run-command.c:655 trace: run_command: REMOTE_ADDR='[::1]' REMOTE_PORT=64258 /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git-daemon --serve --verbose --base-path=. --export-all --user-path
19:45:49.485905 exec-cmd.c:139 trace: resolved executable path from Darwin stack: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git-daemon
19:45:49.486612 exec-cmd.c:238 trace: resolved executable dir: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core
[86868] Connection from [::1]:64258
19:45:49.486782 pkt-line.c:80 packet: git< git-upload-pack /\0host=localhost\0\0version=2\0
[86868] Extended attribute "host": localhost
[86868] Extended attribute "protocol": version=2
[86868] Request upload-pack for '/'
19:45:49.487010 run-command.c:655 trace: run_command: GIT_PROTOCOL=version=2 git upload-pack --strict --timeout=0 .
[86868] 19:45:49.489777 exec-cmd.c:139 trace: resolved executable path from Darwin stack: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git
[86868] 19:45:49.490572 exec-cmd.c:238 trace: resolved executable dir: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core
[86868] 19:45:49.490861 git.c:460 trace: built-in: git upload-pack --strict --timeout=0 .
[86868] 19:45:49.490985 pkt-line.c:80 packet: upload-pack> version 2
[86868] 19:45:49.491008 pkt-line.c:80 packet: upload-pack> agent=git/2.39.5.(Apple.Git-154)
[86868] 19:45:49.491234 pkt-line.c:80 packet: upload-pack> ls-refs=unborn
[86868] 19:45:49.491257 pkt-line.c:80 packet: upload-pack> fetch=shallow wait-for-done ref-in-want
[86868] 19:45:49.491260 pkt-line.c:80 packet: upload-pack> server-option
[86868] 19:45:49.491263 pkt-line.c:80 packet: upload-pack> object-format=sha1
[86868] 19:45:49.491265 pkt-line.c:80 packet: upload-pack> object-info
[86868] 19:45:49.491267 pkt-line.c:80 packet: upload-pack> 0000
[86868] 19:45:49.491621 pkt-line.c:80 packet: upload-pack< command=ls-refs
[86868] 19:45:49.491625 pkt-line.c:80 packet: upload-pack< agent=git/oxide-0.68.0
[86868] 19:45:49.491626 pkt-line.c:80 packet: upload-pack< 0001
[86868] 19:45:49.491628 pkt-line.c:80 packet: upload-pack< symrefs
[86868] 19:45:49.491633 pkt-line.c:80 packet: upload-pack< peel
[86868] 19:45:49.491634 pkt-line.c:80 packet: upload-pack< unborn
[86868] 19:45:49.491636 pkt-line.c:80 packet: upload-pack< 0000
[86868] 19:45:49.492175 pkt-line.c:80 packet: upload-pack> 3f72b39ad1600e6dac63430c15e0d875e9d3f9d6 HEAD symref-target:refs/heads/main
[86868] 19:45:49.492607 pkt-line.c:80 packet: upload-pack> ee3c97678e89db4eab7420b04aef51758359f152 refs/heads/dev
[86868] 19:45:49.492610 pkt-line.c:80 packet: upload-pack> 3f72b39ad1600e6dac63430c15e0d875e9d3f9d6 refs/heads/main
[86868] 19:45:49.492711 pkt-line.c:80 packet: upload-pack> feae03400632392a7f38e5b2775f98a439f5eaf5 refs/tags/annotated peeled:ee3c97678e89db4eab7420b04aef51758359f152
[86868] 19:45:49.492760 pkt-line.c:80 packet: upload-pack> efa596d621559707b2d221f10490959b2decbc6c refs/tags/unannotated
[86868] 19:45:49.492763 pkt-line.c:80 packet: upload-pack> 0000
[86868] 19:45:49.492979 pkt-line.c:80 packet: upload-pack< 0000
[86868] 19:45:49.493028 trace.c:411 performance: 0.002177000 s: git command: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git upload-pack --strict --timeout=0 .
[86839] [86868] Disconnected
I diffed that output (cutting away the first three columns) in comparison with a version from main
, and it was the same (only ports were different).
On failure, there was no output that would say anything - no connection was established except for the first one to test if the server is up.
And now there is the next failure, this time with code that fetches a pack.
Hmm… for now I feel forced to disable these tests, but can't shake the feeling that this is telling something 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When it fails, could one thing be using IPV4 127.0.0.1
and another thing be using IPv6 ::1
?
This seems like a long shot and is somewhat conceptually unformed ("one/another thing"). But I may need to go away from my computer for a while, so I figured I'd mention my one idea so far, in case somehow it turns out to be relevant and could keep the test from having to be turned off on CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the error message I think it is clear that the Git daemon socket isn't open anymore, and that before the first 'real' connection. Maybe there is merit in getting details on the error code at least. stdout/stderr would be visible otherwise, so it would be a natural exit maybe.
Very strange still, also locally it's pretty stable, albeit not perfect, but on CI the failure rate is basically 100% now.
I will forcefully push past this for now, and hope it's truly something related to the Git daemon itself. Maybe it's just timing and main
(pre-merge) would also be broken now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the error message I think it is clear that the Git daemon socket isn't open anymore, and that before the first 'real' connection.
Ah, okay.
Very strange still, also locally it's pretty stable, albeit not perfect, but on CI the failure rate is basically 100% now.
I will forcefully push past this for now, and hope it's truly something related to the Git daemon itself. Maybe it's just timing and
main
(pre-merge) would also be broken now?
I reran CI in my fork at faa0cde, the tip of main before this PR was merged. All tests passed.
On CI, the journey tests run only on Ubuntu, because they are run from the CI test
job, which only runs on ubuntu-latest
. Is it possible that what makes a difference is whether they are run on macOS versus GNU/Linux? (My guess is that you are most often running tests on macOS when you run them locally, and also the output you showed seems to have been from a macOS system.) Could they be usually but not always passing on macOS, while rarely passing on GNU/Linux?
Another difference, in comparing ubuntu-latest
CI runs to local macOS runs, is in versions of Git, since ubuntu-latest
usually has the latest stable version of Git that was available when the most recent production ubuntu-latest
GHA image was built (which is usually recently).
These things, at least the question of whether local GNU/Linux runs succeed, seems like the kind of thing I could just check... and I suspect I will be able to find out, but I am not certain, because I'm not clear on how, exactly, you are running the tests locally. In the CI test output you showed, the displayed failure occurred in tests of the max
feature and not of others. But I don't know if that's something that usually happens, or if it's just nondeterministic and happened to happen that time. Furthermore, if it does usually happen with max
and not the others that are tested earlier, I don't know if that's because of a difference in the features themselves, or if it's just that running the tests multiple times in succession in the same environment causes them to fail due to some kind of effect of the previous runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked into this further and found some useful things, though not a full solution.
A simple, but strange, workaround
I tried reenabling the affected journey tests on CI on a feature branch, to verify that they fail there, which they do. On a separate throwaway test branch, I made the test-journey
job into a matrix job and ran it not just on ubuntu-latest
but also on macos-latest
. I can get the job to fail on macos-latest
on rare occasion, but it almost always succeeds, while in contrast it always fails on ubuntu-latest
. (I've verified that ubuntu-latest
is picking an Ubuntu 24.04 LTS runner in all these tests, including the one you linked above, so that's not the issue.) This seems to accord with the distinction you've reported.
On the feature branch with the affected journey tests enabled, I then tried removing the rust-cache
step. That fixed it! I don't really know why this is, though I have some ideas, and it is probably a workaround rather than a real fix. Nonetheless, I ran the tests on CI more than 15 times, and they always passed. Although I expect to be able to reach a real fix or at least a better workaround, for now this is simple and works, and I may not have a chance to work on this again for a day or two, so I plan to open a PR [edit: #1725] just with the change of turning off caching. Since the journey tests run in their own job, turning off caching doesn't slow down any non-journey tests. With caching turned off, the test-journey
job usually takes between 6:00 and 6:30 to run, which I think is probably acceptable.
My experiments so far: idea 1
There are two reasons turning off caching may have helped. One is if artifacts from the cache are affecting the behavior. That shouldn't happen, though, since any stale versions of dependencies should automatically not be used. The other reason, though, is timing: without caching, there are longer delays, due to more compilation, not just before any journey tests can run, but between the different runs of the journey tests that are done using different feature sets. Perhaps whatever condition triggers the problem corrects itself, given sufficient time between runs.
Returning to my initial idea about ::1
vs. 127.0.0.1
, in that test run (linked there), as well as in all of the various test runs I've done since that have reproduced the problem, nc
reports it has connected to ::1
for the several tests using git-daemon
that succeed, and reports it has connected to 127.0.0.1
for the test using git-daemon
that fails.
This suggests that, in the failing test, git-daemon
can't bind ::1
and binds 127.0.0.1
instead, or, alternatively, that for some reason, it decides to bind 127.0.0.1
and not ::1
, while gix
still uses ::1
. I don't know why this would happen, since I would think that it should be binding both if they both exist, unless directed otherwise.
Such a situation could happen if a previous run of git-daemon
still has the port, and that's what most signs point to, even separately from the address disparity: the nc
command checks that it can connect, but it does not check that it is connecting to the instance of git-daemon
that was just run. Furthermore, there are two reasons to suspect that a previous git-daemon
instance could still be running:
- The
git
process (started by thegit -c ... daemon ...
command) actually forks before it execsgit-daemon
from itsgit-core
directory and both processes are typically running. (This can be observed withpstree
.) I've verified that this happens on both GNU/Linux and macOS.$!
is the PID of thegit
parent, not thegit-daemon
child. If theEXIT
trap either fails to run or theSIGTERM
signal itskill
command sends to the parentgit
process does not result in thegit-daemon
child process terminating, and if nothing else terminates it, then it will still be running. set -e
will not catch a failure from an asynchronous... &
command that is able to create an asynchronous process, since its status will be successful even if the asynchronous process immediately fails.
However...that doesn't seem to be happening. If I instrument the code with a pre-check using nc
that forces failure if the port is already open, that failure does not occur: the port is closed before each test runs git daemon
. This holds up even if both IPv4 and IPv6 localhost
are checked with separate nc
commands.
And yet...
My experiments so far: idea 2
...it cannot be a coincidence that nc
always reports connecting to ::1
if gix
is about to succeed, and always reports connecting to 127.0.01
if gix
is about to fail. What is gix
connecting to?
It seems, for some reason, that it is connecting to ::1
. I don't know if this is due to change in the behavior of gix
, or if it is instead brought about by timing. This could still even be entirely due the journey test shell scripts and git
behavior, since what it connects to is at least in part related to what interfaces are being listened on. I am not sure.
The reason I think--or, I should say, thought--gix
is treating localhost
to mean ::1
, even when nc
has treated it to mean 127.0.0.1
, is that it looks like the problem went away when I added --listen=::1
to the git daemon
command and modified the nc -z
loop that follows it to connect to ::1
. But I am not sure that was a correct observation, because when I do this now, it is taking forever, even on tests that had worked before with ::1
: somehow, even where nc
had reported connecting to ::1
, it cannot do so when passed ::1
.
Anyway, even if further tweaks along these lines--which I do plan to attempt--does make it work, there will be two problems:
- Viewing things narrowly, it would be nice to keep the journey tests broadly runnable locally, and not everybody has both IPv4 and IPv6 set up, even on the loopback interface. Also, though maybe this less significant, I have found that not all implementations of
nc
accept-6
or even recognize::1
: when I installgnu-netcat
on an Arch Linux system (that does have IPv6 and I canping ::1
), neither works. - Viewing things broadly, the problem is that both the invocation of
git daemon
and the access to it bygix
should, if possible, work on all systems that have at least one of IPv4 and IPv6. If that is not feasible, then the reason, and whatever limitations it imposes, should be documented.
In addition, while it seems to me that these various signs of the IPv4/IPv6 distinction mattering are significant, it might still turn out that this general idea of mine is mistaken--with some other explanation for the coincidence--and that this neither causes nor can be used to fix the problem.
Current status and branch links
There are three relevant branches in my fork. The least interesting is run-ci/macos, where I experimented with running the journey tests on CI on macOS as well as Ubuntu. The other two are interesting. Of them, run-ci/daemon is the most useful because it contains the removal of caching that works around the problem, while run-ci/daemon-next consists of it with more commits trying various things: the things described above, and a bit more. (39a1cc7 is the first commit of run-ci/daemon-next not on run-ci/daemon.)
Of these, only run-ci/daemon would be suitable for merging at this time, so I'll only open a PR for it. [Edit: As noted above, I've opened #1725 for this.] These are the repeated test runs that pass with that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good to know, thank you! This means that this PR does contains a change that causes the failures.
On MacOS, I run j journey-tests
with git version 2.39.5 (Apple Git-154). Indeed, if the errors show they only show with the max
feature. It's also true that the failures only happen with the max
future, but always work with the max-pure
and async
versions of the binaries.
max-pure
definitely is the same when it comes to setting up blocking TCP connections to the local server compared to max
.
Also known is that in the failing max
run of the journey tests, the git-daemon
binary comes up and accepts its first (fake) connection, but is down once gix
tries to connect.
After running it 3 times successfully, the fourth time I saw this:
[15435] 07:25:25.095601 pkt-line.c:80 packet: upload-pack> efa596d621559707b2d221f10490959b2decbc6c refs/tags/unannotated
[15435] 07:25:25.095603 pkt-line.c:80 packet: upload-pack> 0000
[15435] 07:25:25.095791 pkt-line.c:80 packet: upload-pack< 0000
[15435] 07:25:25.095794 trace.c:411 performance: 0.001274000 s: git command: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git upload-pack --strict --timeout=0 .
[15209] [15435] Disconnected
/Users/byron/dev/github.com/GitoxideLabs/gitoxide/tests/journey/gix.sh: line 160: kill: (15405) - No such process
error: Recipe `journey-tests` failed on line 204 with exit code 1
gitoxide ( main) +11 -2 [$!] took 3s
❯ ps aux | grep git
byron 15467 0.0 0.0 410059184 192 s002 R+ 7:25AM 0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox --exclude-dir=.venv --exclude-dir=venv git
byron 15209 0.0 0.0 410754192 4288 s002 S 7:25AM 0:00.00 /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git-daemon --verbose --base-path=. --export-all --user-path
byron 15206 0.0 0.0 410736816 4224 s002 S 7:25AM 0:00.01 /Applications/Xcode.app/Contents/Developer/usr/bin/git -c uploadpack.allowrefinwant daemon --verbose --base-path=. --export-all --user-path
gitoxide ( main) +11 -2 [$!]
What happened is that an upload-pack child sub-process was 'stuck' in the prior run, so the current run was unable to bring up a new git daemon
(line 160: kill: (15405) - No such process
.
Upon running the journey test with this patch applied again and letting it finish…
diff --git a/tests/helpers.sh b/tests/helpers.sh
index 2f435a553..0b45e34b5 100644
--- a/tests/helpers.sh
+++ b/tests/helpers.sh
@@ -60,7 +60,16 @@ function small-repo-in-sandbox() {
}
function launch-git-daemon() {
- git -c uploadpack.allowrefinwant daemon --verbose --base-path=. --export-all --user-path &>/dev/null &
+ GIT_TRACE=true \
+ GIT_TRACE_PACK_ACCESS=true \
+ GIT_TRACE_PACKET=true \
+ GIT_TRACE_PACKFILE=true \
+ GIT_TRACE_PERFORMANCE=true \
+ GIT_TRACE_SHALLOW=true \
+ GIT_TRACE_SETUP=true \
+ GIT_CURL_VERBOSE=true \
+ GIT_SSH_COMMAND="ssh -VVV" \
+ git -c uploadpack.allowrefinwant daemon --verbose --base-path=. --export-all --user-path &
daemon_pid=$!
while ! nc -z localhost 9418; do
sleep 0.1
diff --git a/tests/journey.sh b/tests/journey.sh
index 63771e415..c1a110681 100755
--- a/tests/journey.sh
+++ b/tests/journey.sh
@@ -36,5 +36,5 @@ export LC_ALL=C
set-static-git-environment
-source "$root/journey/ein.sh"
+#source "$root/journey/ein.sh"
source "$root/journey/gix.sh"
… I saw this:
[it] generates the correct output in JSON format
07:31:51.918740 exec-cmd.c:139 trace: resolved executable path from Darwin stack: /Applications/Xcode.app/Contents/Developer/usr/bin/git
07:31:51.919058 exec-cmd.c:238 trace: resolved executable dir: /Applications/Xcode.app/Contents/Developer/usr/bin
07:31:51.919414 git.c:750 trace: exec: git-daemon --verbose --base-path=. --export-all --user-path
07:31:51.919423 run-command.c:655 trace: run_command: git-daemon --verbose --base-path=. --export-all --user-path
07:31:51.921568 exec-cmd.c:139 trace: resolved executable path from Darwin stack: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git-daemon
07:31:51.921777 exec-cmd.c:238 trace: resolved executable dir: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core
[16778] Could not bind to ::: Address already in use
[16778] Ready to rumble
07:31:52.021314 run-command.c:655 trace: run_command: REMOTE_ADDR=127.0.0.1 REMOTE_PORT=65349 /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git-daemon --serve --verbose --base-path=. --export-all --user-path
Connection to localhost port 9418 [tcp/git] succeeded!
[with] version 1
07:31:52.023856 exec-cmd.c:139 trace: resolved executable path from Darwin stack: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git-daemon
07:31:52.024359 exec-cmd.c:238 trace: resolved executable dir: /Applications/Xcode.app/Contents/Developer/usr/libexec/git-core
[it] generates the correct output[16781] Connection from 127.0.0.1:65349
fatal: the remote end hung up unexpectedly
[16778] [16781] Disconnected (with error)
- FAIL
$ /Users/byron/dev/github.com/GitoxideLabs/gitoxide/target/debug/gix --no-verbose --config protocol.version=1 remote --name git://localhost/ refs
Expected actual status 1 to be 0
Error: connection failed
Caused by:
0: An IO error occurred when connecting to the server
1: Connection refused (os error 61)
error: Recipe `journey-tests` failed on line 204 with exit code 1
gitoxide ( main) +11 -2 [$!] took 3s
And checking for stray processes yielded nothing:
❯ ps aux | grep git
byron 16899 0.0 0.0 410724160 1536 s002 S+ 7:32AM 0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox --exclude-dir=.venv --exclude-dir=venv git
I think what's happening here is a race condition: the previous run creates a Git daemon which actually fails as the address is already in use. However, it doesn't notice this as it backgrounds the process, and instead tries to connect which succeeds… as it connects to the previous instance of the git daemon
which is still in flight but on the way to shut down.
This is why seemingly it can connect to the Git instance, but is unable to connect right after.
It seemed like an easy fix, just check if there still is someone listening and if so, wait:
while nc -z localhost 9418; do
echo "waiting for previous daemon to shut down"
sleep 1
done
However, this never kicks in as apparently it's totally possible that one cannot connect to a port, but it's also not possible to listen to the same port (yet).
The alternative was to try to bind to the port yourself to see if it works, but that would cause the same issue that this was trying to combat.
One might think that a blank sleep 1
would do it, but think again. It's just like git-daemon
often cannot open a socket, maybe it's E_TRYAGAIN
and it just doesn't care.
Looking at the code it is indeed clear that it doesn't care about the actual error code, so listen to sockets seems flaky at least if the socket was recently in use.
5f11cc7
to
3b40f8e
Compare
3b40f8e
to
9566488
Compare
This is part of investigating the problem of not being able to connect to the git daemon. This problem is described in: GitoxideLabs#1634 (review) The preceding commit, bf2abdb, which this reverts, omitted caching from the `test-journey` CI job, and that somehow avoided the problem, causing all tests to pass. With that change, the failure did not recur even after rerunning the job more than 15 times: https://github.com/EliahKagan/gitoxide/actions/runs/12363538568 That is strange, because no cause related to Rust dependencies is apparent: The Git daemon is not itself provided by or expected to be affected by them, and any stale dependencies from the cache should automatically not be used. However, perhaps the extra *delays*, incurred by not having cached some artifacts and having to rebuild them, allowed the failure-triggering condition to resolve on its own. If so, that would explain this observation. It would also be consistent with how the problem arose with the changes in GitoxideLabs#1634, if a previous delay existed that happened to prevent the bug from being observed and that GitoxideLabs#1634 somehow counteracted. This could be either by a performance improvement, or by shifting the delays to other times during the job (by changing which artifacts were needed for which tests). The `test-journey` job usually takes between 6:00 an 6:30 without caching build artifacts. This is arguably not too bad, so it might be feasible to use that change as a long-term workaround. But if possible the underlying problem should be identified and fixed. So this commit brings back caching to check if it reliably restores the bug. If so, then experiments and other fixes or workarounds can be attempted on top of that.
In each CI run with failing journey tests due to not being able to connect to `git daemon`, there are also passing tests, and in each run that I have observed, including an early run presented in GitoxideLabs#1634 (https://github.com/GitoxideLabs/gitoxide/actions/runs/12359365805/job/34491924319?pr=1634#step:6:1049), all passing journey tests (of those with non-skipped contents) that connect to the git daemon do so after `nc` has reported success connecting to `::1` (IPv6 loclahost), and the failing journey test that attempts to connect to the git daemon does so after `nc` has reported success connecting to `127.0.0.1` (IPv4 localhost). That doesn't clearly elucidate what would be going wrong, and it is possible that the problem is unrelaced, but it looks worthwhile to check into it. At this point, the git daemon is still not told what to listen on. First I just want to see what `nc` reports when checking for both to ensure that nothing is listening on either of them each time `git-daemon` is about to be run.
This is part of investigating the problem of not being able to connect to the git daemon. This problem is described in: GitoxideLabs#1634 (review) The preceding commit, bf2abdb, which this reverts, omitted caching from the `test-journey` CI job, and that somehow avoided the problem, causing all tests to pass. With that change, the failure did not recur even after rerunning the job more than 15 times: https://github.com/EliahKagan/gitoxide/actions/runs/12363538568 That is strange, because no cause related to Rust dependencies is apparent: The Git daemon is not itself provided by or expected to be affected by them, and any stale dependencies from the cache should automatically not be used. However, perhaps the extra *delays*, incurred by not having cached some artifacts and having to rebuild them, allowed the failure-triggering condition to resolve on its own. If so, that would explain this observation. It would also be consistent with how the problem arose with the changes in GitoxideLabs#1634, if a previous delay existed that happened to prevent the bug from being observed and that GitoxideLabs#1634 somehow counteracted. This could be either by a performance improvement, or by shifting the delays to other times during the job (by changing which artifacts were needed for which tests). The `test-journey` job usually takes between 6:00 an 6:30 without caching build artifacts. This is arguably not too bad, so it might be feasible to use that change as a long-term workaround. But if possible the underlying problem should be identified and fixed. So this commit brings back caching to check if it reliably restores the bug. If so, then experiments and other fixes or workarounds can be attempted on top of that.
In each CI run with failing journey tests due to not being able to connect to `git daemon`, there are also passing tests, and in each run that I have observed, including an early run presented in GitoxideLabs#1634 (https://github.com/GitoxideLabs/gitoxide/actions/runs/12359365805/job/34491924319?pr=1634#step:6:1049), all passing journey tests (of those with non-skipped contents) that connect to the git daemon do so after `nc` has reported success connecting to `::1` (IPv6 loclahost), and the failing journey test that attempts to connect to the git daemon does so after `nc` has reported success connecting to `127.0.0.1` (IPv4 localhost). That doesn't clearly elucidate what would be going wrong, and it is possible that the problem is unrelaced, but it looks worthwhile to check into it. At this point, the git daemon is still not told what to listen on. First I just want to see what `nc` reports when checking for both to ensure that nothing is listening on either of them each time `git-daemon` is about to be run.
This reverts commit bf2abdb, the supporting change in GitoxideLabs#1725 that refrained from caching build artifacts from the `test-journey` job, since that no longer needs to be avoided now that `git-daemon` journey tests aren't run on CI. In 9566488 (GitoxideLabs#1634), journey tests that use `git-daemon` were disabled on CI. GitoxideLabs#1725 tried reenabling them (65788e8) and fixing them by no longer caching build artifacts with the `rust-cache` action in the `test-journey` job (bf2abdb). This worked at the time, but the exact reason it worked or how reliable it would be were unknown. It stopped working shortly thereafter in 25b8480 (GitoxideLabs#1726), and those journey tests were re-disabled on CI in d1d3f7c, which reverted 65788e8. However, bf2abdb was not reverted at that time. (The change in d1d3f7c from GitoxideLabs#1726, taken together with this commit, effectively constitute a revert of PR GitoxideLabs#1725.)
Having learned from
gix
it should now be possible to have a command-orientedfetch
that is similar to
gix_protocol::ls_refs()
.Motivated by #1319, as these changes should allow
heartwood
to use most ofgix-protocol
without duplicating a lot of code.
Tasks
fetch()
test-onlyRefMap
transplantedConnection
uses utility transport type to avoid drop impl.gitoxide-core
use new protocol facilities.re-implement the test-onlyfetch
with the command-oriented fetch, maybe provide itas utility in
gix-protocol
even? That would allow the Delegates to remain, butbe specific to this particularly 'simple' way of performing a fetch.
gix
tests are more exhaustive and real.Research
RefMap
Connection
ls-refs
generatesRefMap